SwePub

Result list for search "db:Swepub ;pers:(Lu Zhonghai);mspu:(licentiatethesis)"

  • Results 1-3 of 3
1.
  • Liu, Ming, 1982- (author)
  • A High-end Reconfigurable Computation Platform for Particle Physics Experiments
  • 2008
  • Licentiate thesis (other academic/artistic)
  • Abstract
    • Modern nuclear and particle physics experiments run at very high reaction rates and can deliver data rates of up to a hundred GBytes/s, far beyond what can be stored or analyzed online. Fortunately, physicists are interested in only a very small fraction of this data. To select the interesting events and reject the background through sophisticated pattern recognition, an efficient data acquisition and trigger system is essential; it reduces the data rate by several orders of magnitude. Motivated by the requirements of multiple experiments, we are developing a high-end reconfigurable computation platform for data acquisition and triggering. The system consists of a scalable number of compute nodes, fully interconnected by high-speed communication channels. Each compute node features 5 Xilinx Virtex-4 FX60 FPGAs and up to 10 GBytes of DDR2 memory. A hardware/software co-design approach is proposed for developing custom applications on the platform: performance-critical calculations are partitioned to the FPGA hardware fabric, while flexible, slow control tasks are left to the embedded CPU and the operating system. The system is expected to be high-performance and general-purpose, serving various applications, especially in the physics experiment domain. As a case study, the particle track reconstruction algorithm for HADES has been developed and implemented on the computation platform in the form of processing engines. The Tracking Processing Unit (TPU) recognizes peak bins on the projection plane and reconstructs particle tracks in real time. Implementation results demonstrate acceptable resource utilization and the feasibility of implementing the module together with the system design on the FPGA. Experimental results show that online track reconstruction achieves a speedup of 10.8 to 24.3 times per TPU module compared with the software solution on a 2.4 GHz Xeon commodity server.
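The abstract states that the TPU recognizes peak bins on the projection plane but does not describe how. As a rough software illustration only, the sketch below treats the plane as a 2D histogram of hit counts and marks a bin as a peak when it reaches a threshold and is strictly greater than all of its neighbours. The function name `find_peak_bins`, the threshold, and the 8-neighbour rule are assumptions made for this example; they are not taken from the thesis or from the HADES algorithm.

```python
from typing import List, Tuple


def find_peak_bins(hist: List[List[int]], threshold: int) -> List[Tuple[int, int]]:
    """Return (row, col) of bins that reach `threshold` and are strictly
    greater than every existing neighbour in the 8-neighbourhood.

    Hypothetical illustration, not the thesis implementation.
    """
    rows = len(hist)
    cols = len(hist[0]) if rows else 0
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = hist[r][c]
            if value < threshold:
                continue  # too few hits to count as a track candidate
            is_peak = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the bin itself
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and hist[nr][nc] >= value:
                        is_peak = False
            if is_peak:
                peaks.append((r, c))
    return peaks


# Made-up hit counts on a small projection plane.
plane = [
    [0, 1, 0, 0, 0],
    [1, 5, 1, 0, 0],
    [0, 1, 0, 2, 3],
    [0, 0, 0, 3, 7],
]
peaks = find_peak_bins(plane, threshold=4)
print(peaks)
```

In hardware, a comparison of this kind can be evaluated for many bins in parallel, which is one plausible reason such processing suits an FPGA; the abstract itself reports only the measured speedup.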
2.
  • Lu, Zhonghai (author)
  • Using wormhole switching for networks on chip : feasibility analysis and microarchitecture adaptation
  • 2005
  • Licentiate thesis (other academic/artistic)
  • Abstract
    • Network-on-Chip (NoC) has been proposed as a systematic approach to addressing future System-on-Chip (SoC) design difficulties. Owing to its good performance and small buffering requirement, wormhole switching is being considered as a main network flow control mechanism for on-chip networks. Wormhole switching for NoCs poses challenges in two respects: NoC application design and switch complexity reduction. In a NoC design flow, mapping an application onto the network should include a feasibility analysis to determine whether the messages’ timing constraints can be satisfied and whether the network can be utilized efficiently. This is necessary because network contention leads to nondeterministic behavior in message delivery. For wormhole-switched networks, we have formulated a contention tree model that accurately captures network contention and reflects the concurrent use of links. Based on this model, timing bounds for real-time messages can be derived. Furthermore, we have developed an algorithm to test the feasibility of real-time messages in the networks. At the wormhole switch microarchitecture level, switch complexity should be minimized to reduce cost while incurring only a reasonable performance penalty. We have investigated the flit admission and flit ejection problems, which concern how the flits of packets are admitted into and ejected from the network, respectively. For flit admission, we propose a novel coupling scheme that binds a flit-admission queue to an output physical channel. Our results show that this scheme achieves a reduction of up to 8% in switch area and up to 35% in switch power over other comparable solutions. For flit ejection, we propose a p-sink model, which differs from a typical ideal ejection model in that it uses only p flit sinks to eject flits instead of the p · v flit sinks required by the ideal model, where p is the number of physical channels of a switch and v is the number of virtual channels per physical channel. With this model, the buffering cost of the flit sinks depends only on p and is independent of v. We have evaluated the coupled flit-admission technique and the p-sink model in a 4 × 4 2D mesh network. In our experiments, they exhibit only limited performance penalties in some cases. We believe that these cost-effective models are promising candidates for use in wormhole-switched on-chip networks.
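The sink counts stated in the abstract (p · v for the ideal ejection model, p for the p-sink model) can be made concrete with a little arithmetic. The sketch below simply evaluates both expressions; the function names and the example values p = 5 and v = 1, 2, 4 are illustrative assumptions, not figures from the thesis.

```python
def ideal_sink_count(p: int, v: int) -> int:
    """Flit sinks in the ideal ejection model: one per virtual channel
    of every physical channel, i.e. p * v."""
    return p * v


def p_sink_count(p: int, v: int) -> int:
    """Flit sinks in the p-sink model: one per physical channel,
    regardless of v."""
    return p


# Assumed example: a switch with p = 5 physical channels.
p = 5
for v in (1, 2, 4):
    print(f"v={v}: ideal={ideal_sink_count(p, v)} p-sink={p_sink_count(p, v)}")
```

With these assumed values the ideal model grows from 5 to 20 sinks as v goes from 1 to 4, while the p-sink model stays at 5, which is the sense in which the buffering cost of the flit sinks is independent of v.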
3.
  • Wang, Boqian, 1990- (author)
  • High-Performance Network-on-Chip Design for Many-Core Processors
  • 2020
  • Licentiate thesis (other academic/artistic)
  • Abstract
    • With the development of on-chip manufacturing technologies and the requirements of high-performance computing, the core count is growing quickly in Chip Multi/Many-core Processors (CMPs) and Multiprocessor Systems-on-Chip (MPSoCs) to support larger-scale parallel execution. Network-on-Chip (NoC) has become the de facto solution for CMPs and MPSoCs in addressing the communication challenge. In this thesis, we tackle a few key problems facing high-performance NoC designs. For general-purpose CMPs, we adopt a full-system perspective to design high-performance NoCs for multi-threaded programs. By exploring cache coherence in the whole-system scenario, we present a smart communication service called Advance Virtual Channel Reservation (AVCR), which provides a highway for target packets and can greatly reduce their contention delay in the NoC. AVCR takes advantage of the fact that the destination of some packets can be known or predicted ahead of their arrival at the Network Interface (NI). Exploiting the time interval before a packet is ready, AVCR establishes an end-to-end highway from the source NI to the destination NI. This highway is built by reserving Virtual Channel (VC) resources ahead of the target packet's transmission and offering priority service to flits in the reserved VC in the wormhole router, which avoids the target packets’ VC allocation and switch arbitration delay. In addition, we propose an admission control method for NoCs with a centralized Artificial Neural Network (ANN) admission controller, which can improve system performance by predicting the most appropriate injection rate for each node from network performance information. In the online control process, a data preprocessing unit is applied to simplify the ANN architecture and make the prediction results more accurate. Based on the preprocessed information, the ANN predictor determines the control strategy and broadcasts it to each node, where admission control is applied. For application-specific MPSoCs, we focus on developing a high-performance NoC and NI compatible with the common AMBA AXI4 interconnect protocol. To make it possible to use AXI4-based processors and peripherals in an on-chip-network-based system, we propose a whole-system architecture solution that makes the AXI4 protocol compatible with the NoC-based communication interconnect in the many-core system. Because possible out-of-order transmission in the NoC interconnect conflicts with the ordering requirements specified by the AXI4 protocol, we first focus on the design of the transaction ordering units, realizing a high-performance, low-cost solution to the ordering requirements. The microarchitectures and functionalities of the transaction ordering units are also described and explained in detail for ease of implementation. Then, we focus on the NI and Quality of Service (QoS) support in the NoC. In our design, the NI makes the NoC architecture independent of the AXI4 protocol via message format conversion between the AXI4 signal format and the packet format, offering high flexibility to the NoC design. The NoC-based communication architecture is designed to support multiple high-performance QoS schemes. The NoC system contains Time Division Multiplexing (TDM) and VC subnetworks to apply multiple QoS schemes to AXI4 signals with different QoS tags, and the NI is responsible for traffic distribution between the two subnetworks. In addition, a QoS inheritance mechanism is applied in the slave-side NI to support QoS during packets’ round-trip transfer in the NoC.
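The abstract says that the NI distributes traffic between the TDM and VC subnetworks according to QoS tags, and that the slave-side NI applies QoS inheritance for the round trip, without giving the rules. The sketch below is one possible reading, under loudly stated assumptions: the 4-bit QoS range matches the AXI4 AWQOS/ARQOS signals, but the threshold of 8, the choice to send high-QoS traffic to the TDM subnetwork, and all names (`select_subnetwork`, `inherit_qos`, `Request`, `Response`) are hypothetical and not taken from the thesis.

```python
from dataclasses import dataclass

TDM = "TDM"  # hypothetical label for the time-division-multiplexed subnetwork
VC = "VC"    # hypothetical label for the virtual-channel subnetwork


@dataclass(frozen=True)
class Request:
    txn_id: int
    qos: int  # 4-bit QoS tag (0..15), as carried by AXI4 AWQOS/ARQOS


@dataclass(frozen=True)
class Response:
    txn_id: int
    qos: int


def select_subnetwork(qos: int, tdm_threshold: int = 8) -> str:
    """Pick a subnetwork from the QoS tag (assumed policy: tags at or
    above `tdm_threshold` use TDM, the rest use VC)."""
    if not 0 <= qos <= 15:
        raise ValueError("QoS tag must fit in 4 bits")
    return TDM if qos >= tdm_threshold else VC


def inherit_qos(request: Request) -> Response:
    """Assumed inheritance rule: the response copies the QoS tag of the
    request, so the return trip receives the same treatment."""
    return Response(txn_id=request.txn_id, qos=request.qos)


urgent = Request(txn_id=1, qos=12)
bulk = Request(txn_id=2, qos=3)
print(select_subnetwork(urgent.qos), select_subnetwork(bulk.qos))
print(select_subnetwork(inherit_qos(urgent).qos))
```

Under these assumptions a request and its response take the same class of subnetwork, which is one way to read "QoS during packets' round-trip transfer"; the actual mapping used in the thesis may differ.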